All Questions

0 votes
1 answer
40 views

SVC labels entire sample as majority class, even after using ADASYN

I have an imbalanced sample (850 in group X vs 100 in group Y). I am trying to predict group membership using support vector classification. I am using 'Adaptive Synthetic' (ADASYN) to oversample the ...
Vincent
4 votes
0 answers
69 views

How do you know that your classifier is suffering from class imbalance?

Inspired by @Dave's question "Why does data science see class imbalance as a problem for supervised learning when statistics does not?", I am re-posting a question I posed on the stats SE to ...
Dikran Marsupial
6 votes
3 answers
296 views

Reproducible examples where balancing the training data demonstrably improves accuracy

I asked this question on the Statistics SE, but there were no answers, even when a modest bonus was available, so I am asking here to see if any examples can be given. I have been looking into the ...
Dikran Marsupial
1 vote
1 answer
1k views

I used SMOTE-ENN to balance my dataset and it improved the performance metrics, but how can I be sure it's not overfitting?

The models were evaluated using 10-fold cross validation. foldCount = StratifiedKFold(10, shuffle=True, random_state=1) The models in question are XGBoost. ...
Tariq
2 votes
2 answers
2k views

How to calculate accuracy of an imbalanced dataset

I would like to understand what accuracy means for an imbalanced dataset. Let's suppose we have a medical dataset and we want to predict the disease among the patients. Say, in an existing dataset 95% of ...
Encipher
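
To illustrate the question above, here is a minimal sketch. It assumes a hypothetical sample of 100 patients in which 95 are healthy and 5 have the disease; the excerpt is truncated, so this split and the helper names are assumptions, not the asker's data. A classifier that always predicts the majority class reaches 95% accuracy while detecting no diseased patient, whereas balanced accuracy (the mean of per-class recall) exposes the problem.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples with true class `label` that were predicted as `label`."""
    predictions_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in predictions_for_class) / len(predictions_for_class)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical data: 95 healthy (0) and 5 diseased (1) patients.
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)           # high, despite missing every positive
rec = recall(y_true, y_pred, 1)          # recall on the diseased class
bal = balanced_accuracy(y_true, y_pred)  # average of the two per-class recalls

print(f"accuracy={acc:.2f} minority recall={rec:.2f} balanced accuracy={bal:.2f}")
```

Whether balanced accuracy, recall, or another metric is the right choice depends on the relative cost of each kind of error in the application.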
0 votes
1 answer
94 views

Do I need to use AUPRC for reporting classification results on an imbalanced dataset when the model was trained using upsampling and CV

I am working on a binary classification problem whose dataset has about 5% positive-class samples. I split the dataset, 70% for training and 30% for testing. I used the test data only once for ...
Paul
0 votes
1 answer
130 views

How to effectively evaluate a model with highly imbalanced and limited dataset

Most data imbalance questions on this stack have been asking How to learn a better model, but I tend to think one other problem is How do we define "better" (i.e. fairly evaluate the learned ...
jasperhyp
1 vote
1 answer
431 views

Class imbalance: Will transforming multi-label (aka multi-task) to multi-class problem help?

I noticed this and this questions, but my problem is more about class imbalance. So now I have, say, 1000 targets and some input samples (with some feature vectors). Each input sample can have label ...
jasperhyp
0 votes
1 answer
78 views

Give more weight to features based on distribution plot

I have a task to predict a binary variable, purchase. The dataset is strongly imbalanced (10:100) and the models I have tried so far (mostly ensemble) fail. In ...
robsanna
0 votes
1 answer
78 views

Over-sampling when predicting a continuous variable

Let's say I am predicting house selling prices (continuous) and therefore have multiple independent variables (numerical and categorical). Is it common practice to balance the dataset when the ...
Kev
0 votes
1 answer
249 views

Explaining the logic behind the pipe_line method for cross-validation of imbalanced datasets

Reading the following article: https://kiwidamien.github.io/how-to-do-cross-validation-when-upsampling-data.html There is an explanation of how to use ...
PwNzDust
0 votes
1 answer
2k views

Handling Imbalanced Datasets in Orange

I work in the medical domain, so class imbalance is the rule and not the exception. While I know Python has packages for class imbalance, I don't see an option in Orange for e.g. a SMOTE widget. I ...
Bob Hoyt
3 votes
1 answer
829 views

What does IBA mean in imblearn classification report?

imblearn is a python library for handling imbalanced data. A code for generating classification report is given below. ...
codeczar
2 votes
1 answer
3k views

Using SMOTENC in a pipeline

I am trying to figure out the appropriate way to build a pipeline to train a model which includes using the SMOTENC algorithm: Given that the N-Nearest Neighbors algorithm and Euclidean distance are ...
thereandhere1
1 vote
2 answers
808 views

Cross validation schema for imbalanced dataset

Based on a previous post, I understand the need to ensure that the validation folds during the CV process have the same imbalanced distribution as the original dataset when training a binary ...
thereandhere1
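
The idea in the question above, keeping each validation fold at the original class ratio, can be sketched in plain Python. This is a hand-rolled illustration and not scikit-learn's `StratifiedKFold` implementation; the function name `stratified_folds`, the 90:10 label split, and the fold count are hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the original class ratio."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    # Shuffle within each class, then deal the indices out to the folds round-robin.
    folds = [[] for _ in range(n_folds)]
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical imbalanced labels: 90 majority (0) and 10 minority (1) samples.
labels = [0] * 90 + [1] * 10
folds = stratified_folds(labels, n_folds=5)

# Count minority samples per fold; each fold should mirror the 90:10 ratio.
minority_per_fold = [sum(labels[i] for i in fold) for fold in folds]
print([len(fold) for fold in folds], minority_per_fold)
```

If oversampling is used, it would be applied only to the training portion of each split, so that the held-out fold keeps this original distribution.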
